Nowadays, many beautiful recitations of Al-Quran are available. 
Quranic recitation has its own characteristics, and identifying the 
reciter is similar to the speaker recognition/identification problem. The 
objective of this paper is to develop a Quran reciter identification system using 
Mel-frequency Cepstral Coefficients (MFCC) and a Gaussian Mixture Model 
(GMM). In this paper, a database of five Quranic reciters is developed and 
used in the training and testing phases. We carefully randomized the database 
across various surah in the Quran so that the proposed system depends on the 
reciter rather than on the recited verses. Fifteen Quranic audio samples 
were collected from each of the 5 reciters and randomized; 10 samples per reciter 
were used for training the GMM and 5 samples were used for testing. Results 
showed that the proposed system has a 100% recognition rate for the five 
reciters tested. Even when tested with samples from an unknown reciter, the 
proposed system is able to reject them. 
1. INTRODUCTION 

Al-Quran is the holy book of Muslims, which is written and recited in the Arabic language. 
Interestingly, Al-Quran is the most popular and most recited book of all time [1], [2]. Muslims should try their 
best to avoid mistakes in reciting the Quran, such as errors in the reciting rules (tajwid), missing words or verses, 
and misread vowel pronunciations, punctuation, and accents [3]. Recitation should follow the rules of 
pronunciation, intonation, and caesuras established by the Islamic prophet Muhammad (PBUH). The 
rules and guidance for reading the Quran have been propagated from the prophet Muhammad to the Quranic reciter 
through a verified chain of transmission (sanad). Many non-Arabic people have studied and learnt Al-Quran by 
listening to well-known Quranic reciters (qari). Although each reciter recites the same Quranic verses, 
the recitations differ because of each reciter's unique voice and characteristics. Identifying the Quranic 
reciter is therefore similar to the speaker recognition problem [4]. 

A typical speaker recognition system includes pre-processing, feature extraction, and classification 
[5]. Many features and classifiers have been used in speaker recognition research. Audio features include 
Mel-frequency cepstral coefficients (MFCC) [4], [6], linear-frequency cepstral coefficients (LFCC), and linear 
predictive coefficients (LPC). LFCC is similar to MFCC except that its frequencies are not warped by a 
non-linear frequency scale, and LFCC has been found to perform better than MFCC in female trials [7]. 
As stated by [8], MFCC is the most commonly used feature in speaker recognition. 

Given a set of feature vectors, a model is built for each speaker so that a vector from that 
speaker has a higher probability under its own model than under any other model. Several classifiers have been used, such as 
k-nearest neighbors, vector quantization [6], hidden Markov model (HMM) [9], Gaussian mixture model 
(GMM) [10], artificial neural network [4], and deep neural network (DNN) [11]. Of the various classifiers 
available, in this research we selected GMM as our baseline for speaker recognition. 

Although much research has been conducted on speaker recognition, very little of it targets 
Quranic reciter recognition. Recent research conducted by [12] stated that Quranic recitation has different 
characteristics compared to the English spoken language. Quranic recitation is predominantly voiced 
speech, which could potentially be exploited to build more efficient speaker models. Therefore, the 
objective of this research is to develop a Quranic reciter identification system using MFCC and GMM, and to 
evaluate its performance. The rest of the paper is organized as follows: Section 2 describes the typical 
components in a speaker recognition system. Section 3 explains the proposed Quranic reciter identification 
system. Section 4 evaluates its performance in terms of recognition rate, while Section 5 concludes 
this paper. 


2. SPEAKER RECOGNITION 

The flow chart of a basic speaker recognition model is shown in Figure 1. First, the audio signal 
goes through the front-end processing, in which the features that could uniquely represent the speaker 
information are extracted. Short-time spectral features are the most frequently used type of features [5]. 
The front-end may also include pre-processing modules, such as voice activity detection to remove silence 
from the input, or a channel compensation module to normalize the effect of the recording channel [5], [13]. 
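
As an illustration of the silence-removal step mentioned above, the following is a minimal sketch of an energy-based voice activity detector. The paper does not state which detection method, frame length, or threshold is used, so the function names `split_frames` and `drop_silence` and the relative threshold `ratio` are assumptions made only for this example.

```python
def split_frames(signal, frame_len, hop):
    """Cut a list of samples into fixed-length frames, advancing by `hop` samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def drop_silence(frames, ratio=0.1):
    """Keep frames whose mean energy is at least `ratio` times the largest frame energy.

    The relative threshold is an assumption for illustration, not a value
    taken from the paper.
    """
    energies = [sum(s * s for s in frame) / len(frame) for frame in frames]
    if not energies:
        return []
    cutoff = ratio * max(energies)
    return [frame for frame, e in zip(frames, energies) if e >= cutoff]


# Usage: 8 silent samples followed by 8 samples of an alternating tone.
signal = [0.0] * 8 + [1.0, -1.0] * 4
frames = split_frames(signal, frame_len=4, hop=4)
voiced = drop_silence(frames)
```

In this toy input the two all-zero frames are discarded and the two tone frames are kept.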


[Figure 1: block diagram. Input speech -> Pre-processing -> Feature extraction -> Classification (using the speakers database) -> Recognized speaker (from 1 to N speakers)] 

Figure 1. Typical Speaker Recognition System 


Currently, there are many methods that can be used to verify a speaker's identity, and the two best-known 
methods are linear predictive coding (LPC) and Mel-frequency cepstral coefficients (MFCC) [4], [6]. 
In this paper, MFCC is chosen for feature extraction since it gives higher accuracy. 
MFCC is the most popular method because it is easy to work with and can handle multiple speakers 
or multiple languages. 

A vector of features acquired from the previous step is then compared against a set of speaker 
models. The identity of the test speaker is associated with the identity of the highest scoring model. A 
speaker model is a statistical model that represents speaker-dependent information, and can be used to predict 
new data. Any modeling technique can be used, but the most popular techniques are clustering, hidden 
Markov model, artificial neural network, and Gaussian mixture model [4], [5], [14]. In this research, we used 
GMM as it is one of the most effective techniques in speaker recognition [5]. GMM uses maximum 
log-likelihood estimation for pattern matching and is able to form smooth approximations to 
arbitrarily shaped densities. 


3. PROPOSED QURANIC RECITER IDENTIFICATION SYSTEM 
Figure 2 illustrates our proposed system for Quranic reciter identification. We used MFCC for 
feature extraction and GMM as the classifier due to their popularity and effectiveness for speaker recognition. 


3.1. Mel-Frequency Cepstral Coefficients (MFCCs) 

MFCCs use a non-linear frequency scale, i.e. mel scale, based on the auditory perception. A mel is a 
unit of measure of perceived pitch or frequency of a tone. Equation (1) can be used to convert frequency 
scale to mel scale. 


$$ f_{mel} = 1127 \ln\left(1 + \frac{f_{Hz}}{700}\right) \qquad (1) $$

where $f_{mel}$ is the frequency in mels and $f_{Hz}$ is the normal frequency in Hz. MFCCs are often calculated using 
a filter bank of M filters, in which each filter has a triangular shape and is spaced uniformly on the mel scale 
as shown in Equation (2). 


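$$ H_m[k] = \begin{cases} 0 & k < f[m-1] \\ \dfrac{k - f[m-1]}{f[m] - f[m-1]} & f[m-1] \le k \le f[m] \\ \dfrac{f[m+1] - k}{f[m+1] - f[m]} & f[m] \le k \le f[m+1] \\ 0 & k > f[m+1] \end{cases} \qquad (2) $$

where m = 0, 1, ..., M-1. The log-energy mel spectrum is then calculated as follows: 

$$ S[m] = \ln\left[ \sum_{k=0}^{N-1} |X[k]|^2 H_m[k] \right], \quad m = 0, 1, \ldots, M-1 \qquad (3) $$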


where X[k] is the discrete Fourier transform (DFT) of a speech input x[n]. 
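
The mel warping, triangular filter bank, and log-energy computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: it assumes the natural-log form of the mel conversion (constants 1127 and 700), a one-sided power spectrum as input, and filter boundaries stored as `edges[0..M+1]` so that filter m spans `edges[m]` to `edges[m+2]`. All function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (natural-log form, assumed constants)."""
    return 1127.0 * math.log(1.0 + f_hz / 700.0)


def mel_to_hz(f_mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(f_mel / 1127.0) - 1.0)


def filter_edges(num_filters, n_fft, sample_rate):
    """Return M + 2 boundary points, uniform on the mel scale, as fractional DFT bin indices."""
    step = hz_to_mel(sample_rate / 2.0) / (num_filters + 1)
    return [mel_to_hz(i * step) * n_fft / sample_rate
            for i in range(num_filters + 2)]


def triangular_weight(k, left, centre, right):
    """Weight of one triangular filter at bin k: rises from left to centre, falls to right."""
    if k < left or k > right:
        return 0.0
    if k <= centre:
        return (k - left) / (centre - left)
    return (right - k) / (right - centre)


def log_mel_energies(power_spectrum, num_filters, sample_rate):
    """Log energy of each mel filter output for a one-sided power spectrum |X[k]|^2."""
    n_fft = 2 * (len(power_spectrum) - 1)  # assumes bins 0..N/2 are supplied
    edges = filter_edges(num_filters, n_fft, sample_rate)
    energies = []
    for m in range(num_filters):
        total = sum(p * triangular_weight(k, edges[m], edges[m + 1], edges[m + 2])
                    for k, p in enumerate(power_spectrum))
        energies.append(math.log(total + 1e-12))  # small floor avoids log(0)
    return energies


# Usage: a flat spectrum with 129 bins (256-point DFT) at the 8000 Hz rate used in Section 4.
flat = [1.0] * 129
S = log_mel_energies(flat, num_filters=20, sample_rate=8000)
```

Because the filters are uniform in mels, they become wider in Hz at higher frequencies, so for a flat spectrum the log-energies increase with the filter index.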

Although traditional cepstrum uses inverse discrete Fourier transform (IDFT), mel frequency 
cepstrum is normally implemented using discrete cosine transform (DCT) since S[m] is even, as shown in 
Equation (4): 

$$ \hat{x}[n] = \sum_{m=0}^{M-1} S[m] \cos\left( \frac{\pi n \left(m + \frac{1}{2}\right)}{M} \right), \quad n = 0, 1, \ldots, M-1 \qquad (4) $$
Typically, the number of filters M ranges from 20 to 40, and the number of kept coefficients is 13. 
Some research reported that the performance of speech recognition and speaker identification systems 
peaked with 32-35 filters [8]. 
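
The cosine transform of the log-energies and the truncation to 13 coefficients can be sketched as below. This is a minimal, unnormalized DCT written directly from the sum over filters; the function name `mel_cepstrum` and the default `num_kept=13` are choices made for this example, and the scaling used by the authors' Matlab code is not known.

```python
import math


def mel_cepstrum(log_energies, num_kept=13):
    """Unnormalized cosine transform of the M log mel energies, keeping the first coefficients."""
    M = len(log_energies)
    return [sum(s * math.cos(math.pi * n * (m + 0.5) / M)
                for m, s in enumerate(log_energies))
            for n in range(min(num_kept, M))]


# Usage: a constant log-energy vector has all its energy in coefficient 0.
coeffs = mel_cepstrum([1.0] * 20)
```

For a constant input of length 20, coefficient 0 equals 20 and the remaining kept coefficients are zero up to rounding, which is a quick sanity check on the cosine basis.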
Figure 2. The proposed Quranic reciter identification system 


3.2. Gaussian Mixture Model (GMM) 
GMM provides a probabilistic model of a speaker’s voice. A Gaussian mixture distribution is a 
weighted sum of M densities: 


$$ p(\vec{x} \mid \lambda) = \sum_{i=1}^{M} p_i \, b_i(\vec{x}) \qquad (5) $$

where $\vec{x}$ is a D-dimensional vector, $b_i(\vec{x})$ is the i-th component density, and $p_i$ is the weight of the i-th 
component. The mixture weights satisfy $\sum_{i=1}^{M} p_i = 1$. Each mixture component is a D-variate 
Gaussian density function: 

$$ b_i(\vec{x}) = \frac{1}{(2\pi)^{D/2} |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (\vec{x} - \vec{\mu}_i)^{T} \Sigma_i^{-1} (\vec{x} - \vec{\mu}_i) \right\} \qquad (6) $$

where $\vec{\mu}_i$ is the mean vector, and $\Sigma_i$ is the covariance matrix. A GMM is characterized by the mean vector, 
covariance matrix, and weight from all components. So, we can represent it in a compact notation as follows: 

$$ \lambda = \{ p_i, \vec{\mu}_i, \Sigma_i \}, \quad i = 1, \ldots, M \qquad (7) $$
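
To make the mixture density concrete, the sketch below evaluates its logarithm for one feature vector. It assumes diagonal covariance matrices, which is a common simplification in speaker recognition but is not stated in this paper, and it works in the log domain with a log-sum-exp step to avoid underflow. The function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log of a D-variate Gaussian density with diagonal covariance (variances in `var`)."""
    dim = len(x)
    log_det = sum(math.log(v) for v in var)
    mahalanobis = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (dim * math.log(2.0 * math.pi) + log_det + mahalanobis)


def gmm_log_density(x, weights, means, variances):
    """Log of the weighted sum of component densities, computed with log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, mu, var)
             for w, mu, var in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Usage: a two-component mixture in two dimensions.
value = gmm_log_density([0.2, -0.1],
                        weights=[0.6, 0.4],
                        means=[[0.0, 0.0], [3.0, 3.0]],
                        variances=[[1.0, 1.0], [0.5, 0.5]])
```

With full covariance matrices the determinant and quadratic form would need a matrix factorization instead of the per-dimension sums used here.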


3.3. Maximum Likelihood Estimation 
Given a set of training samples $X = \{\vec{x}_1, \ldots, \vec{x}_T\}$, the most popular method to train a GMM is maximum likelihood 
estimation. The likelihood of a GMM can be defined as: 

$$ p(X \mid \lambda) = \prod_{t=1}^{T} p(\vec{x}_t \mid \lambda) \qquad (8) $$

Maximum likelihood parameters are normally estimated using the expectation maximization (EM) 
algorithm. Among a set of S speakers characterized by parameters $\lambda_1, \lambda_2, \ldots, \lambda_S$, a GMM system makes its 
prediction by returning the speaker that maximizes the a posteriori probability given an utterance X as 
follows: 

$$ \hat{S} = \arg\max_{1 \le k \le S} P(\lambda_k \mid X) = \arg\max_{1 \le k \le S} \frac{p(X \mid \lambda_k) P(\lambda_k)}{p(X)} \qquad (9) $$

If the prior probabilities of all speakers are equal, $P(\lambda_k) = 1/S, \forall k$, and since $p(X)$ is the same for all 
speakers, we can rewrite Equation (9) as follows: 

$$ \hat{S} = \arg\max_{1 \le k \le S} \sum_{t=1}^{T} \log p(\vec{x}_t \mid \lambda_k) \qquad (10) $$
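
The decision rule with equal priors amounts to summing per-frame log-likelihoods under each speaker model and picking the largest total. The sketch below shows only that rule; each model is passed in as a callable returning the log-density of one frame, and the one-dimensional unit-variance Gaussians stand in for trained GMMs purely for illustration. The names `identify`, `total_log_likelihood`, and `unit_gaussian` are hypothetical.

```python
import math


def total_log_likelihood(frames, log_density):
    """Sum of log p(x_t | model) over all frames of an utterance."""
    return sum(log_density(x) for x in frames)


def identify(frames, models):
    """Return the label with the highest total log-likelihood, plus all scores.

    `models` maps a speaker label to a callable giving log p(x | model).
    """
    scores = {label: total_log_likelihood(frames, log_density)
              for label, log_density in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


def unit_gaussian(mean):
    """Toy 1-D stand-in for a trained speaker model (unit variance)."""
    return lambda x: -0.5 * (math.log(2.0 * math.pi) + (x - mean) ** 2)


# Usage: two toy "speakers" centred at 0 and 5.
models = {"A": unit_gaussian(0.0), "B": unit_gaussian(5.0)}
label, scores = identify([4.8, 5.1, 5.3], models)
```

Summing logarithms rather than multiplying densities keeps the computation numerically stable when an utterance contains thousands of frames.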


4. RESULTS AND DISCUSSION 

In this section, we present the experimental setup and Quranic audio database, the experiment with 
training samples, the experiment with testing samples, the experiment with unknown samples, and the recognition rate 
evaluation. We carefully randomized the audio database so that the proposed system depends on the 
reciter's voice rather than on the recited verses. 


4.1. Experimental Setup and Quranic Audio Database 

A high-performance multicore system was used for processing: an Intel Core i7-6700K at 
4.00 GHz (4 cores with 8 threads), 32 GBytes RAM, a 256 GBytes SSD and a 2 TBytes hard disk, installed 
with the latest version of the 64-bit Windows 10 operating system and Matlab 2017b with the Signal Processing and 
Neural Network Toolboxes. 

The audio database in this research was downloaded in MP3 form from the internet or taken 
from audio CDs for five reciters, i.e. Abdul Basit (Reciter A), Abdurahman As-Sudais (Reciter B), 
Saud Ash-Shuraym (Reciter C), Sheikh Ali Abdulrahman (Reciter D) and Sheikh Said al-Ghamdi (Reciter 
E). Using Audacity software, the audio files were converted from MP3 to mono WAV files with an 8000 Hz sampling 
frequency, and were cut into samples of 60 seconds each. The database is divided into two 
parts: training and testing samples. The training samples are samples 01 to 10, while 
the testing samples are samples 11 to 15. Table 1 shows the Quranic audio database for the five reciters. 


Table 1. Quranic Audio Database for Five Reciters 


Samples Reciter A Reciter B Reciter C Reciter D Reciter E 
01 Al-Fatihah Al-Munafiqun Al-Baqarah An-Naba’ Al-Fatihah 
02 Al-Baqarah At-Talaq Al-Fajr An-Nazi’at Al-Lail 
03 An-Nisa’ At-Tahrim Al-Ghasyiyah ‘Abasa At-Takwir 
04 Al-Imran Al-Mursalat Al-Buruj At-Takwir Al-Infitar 
05 Al-Ma’idah Al-Insan Al-Insyiqaq Al-Infitar Al-Mutaffifin 
06 Al-An’am Al-Muddaththir Al-Infitar Al-Mumtahanah Al-Insyiqaq 
07 Ar-Ra’d Al-Muzzamil ‘Abasa Al-Insyiqaq Al-Buruj 
08 Ibrahim Jin An-Nazi’at Al-Buruj At-Tariq 
09 Al-Hijr Nuh Nuh At-Tariq Nuh 
10 Maryam Al-Haqqah Al-Haqqah Al-A’la ‘Abasa 
11 Al-Ghashiyah Al-Qalam Al-Qalam Al-Ghashiyah Al-Muzammil 
12 Al-Kahfi At-Taghabun At-Tahrim Al-Fajr Al-Qiyamah 
13 Yasin Al-Jumu’ah At-Talaq Al-Balad Al-Insan 
14 Al-Mulk As-Saff At-Taghabun Ad-Dhuha An-Naba’ 
15 Yunus Al-Mulk Al-Mulk Ash-Shams An-Nazi’at 


4.2. Experiment with the Training Samples 
In this experiment, samples 01 to 10 for each reciter were used to train the GMM. The same 
samples were then used to evaluate the recognition rate of the trained GMM. The log-likelihood was 
calculated for each reciter and each training sample, and the reciter with the highest likelihood was selected as the 
recognized reciter. Table 2 shows the recognition rate of the training samples. It shows that every sample for 
every reciter was identified correctly, which means that the recognition rate for the training phase was 100%. 


Table 2. Recognition Rate of Training Samples 


Reciter Samples 
01 02 03 04 05 06 07 08 09 10 
A Y Y Y Y Y Y Y Y Y Y 
B Y Y Y Y Y Y Y Y Y Y 
C Y Y Y Y Y Y Y Y Y Y 
D Y Y Y Y Y Y Y Y Y Y 
E Y Y Y Y Y Y Y Y Y Y 


*Y = matched, N= unmatched 


4.3. Experiment with the Testing Samples 

In this experiment, the GMM trained in Section 4.2 was used to test different samples. 
Samples 11 to 15 were used to evaluate the recognition rate of the trained GMM when tested with 
different samples from the same reciters. The log-likelihood was calculated for each reciter and each testing 
sample, and the reciter with the highest likelihood was selected as the recognized reciter. Table 3 shows the recognition 
rate of the testing samples. It shows that every sample for every reciter was identified correctly, 
which means that the recognition rate for the testing phase was 100% as well. 


Table 3. Recognition Rate of Testing Samples 


Reciter Samples 
11 12 13 14 15 
A Y Y Y Y Y 
B Y Y Y Y Y 
C Y Y Y Y Y 
D Y Y Y Y Y 
E Y Y Y Y Y 


*Y = matched, N= unmatched 


4.4. Experiment with Unknown Samples 
The last experiment was conducted to evaluate whether the proposed system can detect an unknown 
speaker who is not in the database. The database was collected for five reciters only: Abdul Basit, 
Abdurahman As-Sudais, Saud Ash-Shuraym, Sheikh Ali Abdulrahman and Sheikh Said al-Ghamdi. For this 
purpose, one unknown reciter named Fatih was tested against the GMMs trained on the training samples. The maximum 
log-likelihood of the unknown speaker's samples did not match the model of any reciter in the 
database. As a result, Fatih was recognized as an unknown speaker, as shown in 
Table 4. 


Table 4. Result of testing with unknown reciter 


Speaker Samples 
01 02 03 04 05 06 07 08 09 10 
A N N N N N N N N N N 
B N N N N N N N N N N 
C N N N N N N N N N N 
D N N N N N N N N N N 
E N N N N N N N N N N 


*Y = matched, N= unmatched 
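
The paper does not describe how a sample is declared unmatched. One common way to obtain such open-set behaviour is to compare the best model's average per-frame log-likelihood with a threshold; the sketch below illustrates that idea only. The threshold rule, its value, and the function name `identify_or_reject` are assumptions for this example and are not taken from the experiments reported here.

```python
def identify_or_reject(scores, num_frames, threshold):
    """Return the best-scoring label, or "unknown" if its average log-likelihood is too low.

    `scores` maps each enrolled reciter to the total log-likelihood of the utterance;
    `threshold` is a hypothetical per-frame cut-off that would need tuning on held-out data.
    """
    best = max(scores, key=scores.get)
    if scores[best] / num_frames < threshold:
        return "unknown"
    return best


# Usage with made-up scores for a 100-frame utterance.
accepted = identify_or_reject({"A": -300.0, "B": -120.0}, num_frames=100, threshold=-2.0)
rejected = identify_or_reject({"A": -900.0, "B": -800.0}, num_frames=100, threshold=-2.0)
```

Normalizing by the number of frames makes the threshold independent of utterance length, which matters if shorter samples are tested later.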


4.5. Recognition Rate Evaluation 
It can be concluded that all three experiments were conducted successfully. The results showed 
that the proposed system was able to verify and identify the trained reciters. Although each reciter 
recited a different surah in each sample, the system could still recognize the pattern of the 
reciter's recitation. Furthermore, the proposed system was also able to reject the unknown reciter tested. 
Table 5 shows the recognition rate for each experiment. This result is better than the result reported in [4], 
in which around 91% accuracy was obtained using MFCC and ANN. The better results achieved in this paper 
could be due to the use of GMM instead of ANN, and to the randomized Quranic audio database, so that the 
proposed system does not depend on the recited Quranic verses but is able to distinguish the characteristics of 
the individual reciter. 


Table 5. Performance of the system with different samples 


Experiment Recognition Rate 

Testing with training samples 100% match, 0% mismatch 
Testing with testing samples 100% match, 0% mismatch 
Testing with unknown reciter 0% match, 100% mismatch 


5. CONCLUSIONS AND FUTURE WORKS 
This paper has presented the development of a Quranic reciter identification system using MFCC and 
GMM. MFCC was selected as the feature set, while GMM was selected as the classifier. First, we built a Quranic 
audio database from five reciters, in which they recite different surah from Al-Quran. Altogether, 15 
samples were collected for each reciter, of which 10 samples were used to train the GMM and 5 samples were used 
for testing. Furthermore, we used another, unknown reciter to evaluate the performance of the proposed 
system. Results showed that our proposed system achieved 100% accuracy in the training and testing phases. 
The samples from the unknown reciter were also rejected at a 100% rate. Further research includes variation with shorter 
utterances of the recited Quranic verses, different reciters, different features, or different classifiers. 